(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 788 052 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication:
06.08.1997 Bulletin 1997/32

(51) Int. Cl.: G06F 11/14

(21) Application number: 97100684.6

(22) Date of filing: 17.01.1997

(84) Designated Contracting States:
DE FR GB

(30) Priority: 31.01.1996 JP 15352/96

(71) Applicant: KABUSHIKI KAISHA TOSHIBA
Kawasaki-shi (JP)

(72) Inventors:
• Hoshina, Satoshi
1-1 Shibaura 1-chome, Minato-ku, Tokyo 105 (JP)
• Sakuma, Takeshi
1-1 Shibaura 1-chome, Minato-ku, Tokyo 105 (JP)
• Sakai, Hiroshi
1-1 Shibaura 1-chome, Minato-ku, Tokyo 105 (JP)

(74) Representative: Henkel, Feiler, Hänzel & Partner
Möhlstrasse 37
81675 München (DE)

(54) I/O control apparatus having check recovery function



(57) In a computer system, when a CPU (1a, 1b) performs state setting of an operation mode or the like to I/O devices (4a, 4b), the log data of the state setting is stored in a set log storage area. Upon occurrence of a fault in the computer system, the I/O devices are cleared, and state setting of the I/O devices is performed on the basis of the log data of the state setting stored in the set log storage area (34). Therefore, the states of the I/O devices can be recovered to a state at a checkpoint when the process is restarted.

Description

The present invention relates to an I/O control 
apparatus adapted to a computer system having a 
checkpoint recovery function. 

In recent years, computer systems have developed considerably. With this development, the requirements for reliability, such as coping with faults, have become strict. One method for constructing a fault-tolerant computer system is the checkpoint recovery scheme.

In one method for implementing a checkpoint recovery scheme, the internal state of a CPU, namely the contents of its registers and cache memory, is periodically saved in a main memory to acquire a checkpoint on the main memory. When data processing cannot be continued due to a fault in the computer system, the main memory is restored to the state of the most recent checkpoint, and data processing is restarted using the internal state of the CPU stored in the main memory.

The main memory is restored to the state of the checkpoint as follows. Each time the main memory is updated, the address to be updated and the data held at that address before the update (the before data) are stored in a memory state recovery unit 55. Upon occurrence of a fault in the computer system, the main memory is written back with the before data stored in the memory state recovery unit 55.
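The before-image mechanism can be illustrated in software. The following is a minimal Python sketch, not the hardware unit itself: the class names `MemoryStateRecoveryUnit` and `MainMemory` are hypothetical, memory is assumed to be a list of words, and every write is assumed to go through `MainMemory.write`.

```python
class MemoryStateRecoveryUnit:
    """Hypothetical software model of a before-image (undo) log."""

    def __init__(self):
        self.undo_log = []  # (address, before_data) pairs, oldest first

    def record(self, address, before_data):
        self.undo_log.append((address, before_data))

    def clear(self):
        # Checkpoint acquisition: the current memory becomes the new checkpoint.
        self.undo_log.clear()

    def roll_back(self, cells):
        # Write back newest first, so that for an address updated several
        # times the oldest before-image (its value at the checkpoint) wins.
        for address, before_data in reversed(self.undo_log):
            cells[address] = before_data
        self.undo_log.clear()


class MainMemory:
    def __init__(self, size, recovery_unit):
        self.cells = [0] * size
        self.recovery_unit = recovery_unit

    def write(self, address, data):
        # Save the before data first, then perform the update.
        self.recovery_unit.record(address, self.cells[address])
        self.cells[address] = data


unit = MemoryStateRecoveryUnit()
memory = MainMemory(4, unit)
memory.write(0, 10)
unit.clear()                  # checkpoint: cells are [10, 0, 0, 0]
memory.write(0, 11)
memory.write(0, 12)
memory.write(3, 7)
unit.roll_back(memory.cells)  # fault: restore the checkpoint state
```

Writing back newest first only matters when an address is updated more than once after the checkpoint; address 0 above exercises that case.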

Although, in this checkpoint recovery scheme, the internal state of the main memory or CPU can be restored to the state of the most recent checkpoint by using the memory state recovery unit 55 when a fault occurs in the computer system, an I/O device connected to the computer system cannot easily be restored to the state of the most recent checkpoint.

This problem will be described below with reference 
to FIGS. 1 and 2. 

As shown in FIG. 1, in this computer system, a CPU 51 requests a disk controller 52 to access a disk 53 to perform an I/O operation. FIG. 2 shows a timing diagram of the I/O processing of the computer system having the above arrangement.

As depicted in FIG. 2, the registers of the disk controller 52 are set to read data from a predetermined position of the disk 53 at times T0 to T1 ((1) in FIG. 2), and the disk controller 52 is started at time T1 ((2) in FIG. 2). The disk controller 52 and the disk 53 then execute a read operation at times T1 to T2 ((3) in FIG. 2). The read data are transferred into the main memory 54 by DMA transfer from the disk controller 52.

The CPU 51 receives a completion interrupt from the disk controller 52 at time T2 ((4) in FIG. 2) and performs completion interrupt processing for the disk controller 52 at times T2 to T3 ((5) and (6) in FIG. 2). Other post-processing for the read operation is performed at times T3 to T4 ((7) in FIG. 2).

The first difficulty in this case is that a checkpoint 
acquired at an arbitrary timing is not always valid. 

For example, assume that a checkpoint is acquired in the middle of setting the registers of the disk controller 52 (the setup sequence between times T0 and T1). If a fault then occurs in the computer, the latter part of the setup sequence is re-performed from the most recent checkpoint; that is, only some of the registers of the disk controller 52 are set again. For this reason, the disk controller 52 does not always operate as desired.

Given the characteristics of the disk controller 52, this applies not only to times T0 to T1 described above but also to times T0 to T3. That is, when the CPU 51 acquires a checkpoint during the setup sequence of an I/O operation such as a read/write operation, the disk controller 52 does not always operate as desired when the latter part of the setup sequence is re-performed from the checkpoint after a fault occurs in the system.

One way to solve this difficulty is to prohibit checkpointing during the setup sequence of an I/O device. However, in a computer system incorporating many I/O devices, the CPU is almost always performing the setup sequence of some I/O operation. Therefore, preventing checkpointing during the setup sequence of an I/O device may considerably degrade performance.

The second difficulty is as follows. Assume that a fault occurs in the system during a DMA transfer from the disk controller 52 to the main memory 54. In this case, the ongoing DMA transfer must be stopped before the main memory 54 is restored to the state of the most recent checkpoint.

In a conventional computer system, stopping an ongoing DMA transfer requires initializing (resetting) the I/O device. Since initialization puts the I/O device in its initial state, a special process is required to restore the I/O device to the state of the most recent checkpoint.

The following two schemes are known for solving the problem of I/O processing in the above checkpoint recovery scheme.

The first scheme is disclosed in USP-4740969, "METHOD AND APPARATUS FOR RECOVERING FROM HARDWARE FAULTS". During normal data processing, the data read from and written to the registers of an I/O device, and the interrupts from the I/O device, are recorded in a log memory. When a register setup sequence is restarted from the most recent checkpoint after a fault occurs in the computer system, the read/write operations that were performed on the registers of the I/O device before the fault are re-performed as follows. For a write operation, the data is discarded and not written to the registers of the I/O device. For a read operation, the data in the log memory is returned to the CPU instead of being read from the register of the I/O device. For an interrupt from the I/O device, the interrupt is generated and sent to the CPU at the same timing as in the preceding execution.
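The write-discard and read-from-log behaviour attributed above to USP-4740969 can be sketched as follows. This is an illustrative Python model under stated assumptions, not the patented circuit: the device is a dict of register values, the class name `LoggedIOPort` is hypothetical, and interrupt replay is omitted.

```python
class LoggedIOPort:
    """Hypothetical model: register accesses are logged during normal running
    and served from the log when re-executed after a rollback."""

    def __init__(self, registers):
        self.registers = registers  # the real device's registers (a dict here)
        self.log = []               # ("read" | "write", register, value)
        self.replay_pos = None      # None = normal mode, int = replaying

    def checkpoint(self):
        self.log.clear()

    def begin_replay(self):
        self.replay_pos = 0 if self.log else None

    def _consume(self, kind, register):
        logged_kind, logged_reg, value = self.log[self.replay_pos]
        if (logged_kind, logged_reg) != (kind, register):
            raise RuntimeError("re-execution diverged from the logged sequence")
        self.replay_pos += 1
        if self.replay_pos == len(self.log):
            self.replay_pos = None  # log exhausted: resume normal operation
        return value

    def write(self, register, value):
        if self.replay_pos is not None:
            self._consume("write", register)  # discarded, not sent to the device
            return
        self.registers[register] = value
        self.log.append(("write", register, value))

    def read(self, register):
        if self.replay_pos is not None:
            return self._consume("read", register)  # answered from the log
        value = self.registers[register]
        self.log.append(("read", register, value))
        return value


device = {"cmd": 0, "status": 1}
port = LoggedIOPort(device)
port.checkpoint()
port.write("cmd", 5)
first = port.read("status")
device["status"] = 2          # the device has moved on since the first run
port.begin_replay()           # fault: re-execute from the checkpoint
port.write("cmd", 5)          # discarded
second = port.read("status")  # returns the logged value, not 2
```

The divergence check in `_consume` is an added assumption of this sketch; it makes visible the requirement that the re-execution issue the same register accesses in the same order.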

This scheme requires a special interface circuit that an ordinary computer system does not provide. Moreover, it is difficult to apply this scheme to a multiprocessor system.
The second scheme is disclosed in "Sequoia: A Fault-tolerant Tightly Coupled Multiprocessor for Transaction Processing", IEEE Computer, February 1988. In this scheme, data processing in a computer system is divided into a data processing portion, which can be performed using only the CPUs and a main memory, and an I/O processing portion, which handles the I/O devices. These portions are executed by different computers.

FIG. 3 shows the schematic arrangement of a computer system in which data processing is divided into a portion that accesses only the main memory and a portion that includes access to the I/O devices. The former is executed by a computer 100 whose reliability is improved by the checkpoint recovery scheme, and the latter is executed by a computer 200 that does not use the checkpoint recovery scheme. At the logical interface between these portions, a request such as "read the designated amount of data at the designated position of the designated disk" is sent from the computer 100 to the computer 200. When the computer 200 has actually read the data, it returns to the computer 100 a termination code indicating whether the operation completed normally, together with the data read from the disk.

To improve the reliability of the computer 200, its constituent elements are duplicated. Namely, the computer 200 consists of computer main bodies 210a and 210b and I/O devices 220a and 220b. In the normal state, a request is processed on both sides simultaneously, and the execution results are compared with each other to check whether they are identical. If a fault occurs on one side, the requested operation continues on the remaining side.

This scheme has the following disadvantage: since at least two types of computers must be prepared, the computer system is large and costly.

The second scheme suggests the following idea: the computer 100 and the computer 200 could be implemented on one computer by using virtual computer technology. However, this idea does not work well, for the following reason.

The scheme disclosed in Sequoia is based on the following assumption. Since the computers 100 and 200 are independent, even if data processing on the computer 100 is restarted from a checkpoint because of a fault within the computer 100, the I/O processing of the computer 200 is not affected by the fault.

However, if the computer 100 and the computer 200 were implemented on one computer by using virtual computer technology, both would be affected simultaneously by a fault occurring in the underlying computer system.

As described above, a checkpoint recovery computer system requires special treatment of the I/O processing portion. Either a special interface is arranged between the CPU and the I/O device, or the calculating portion and the I/O processing portion are performed separately on two independent computers. In either case, the cost increases considerably.

It is an object of the present invention to provide an I/O control apparatus capable of controlling an I/O device on a single computer having a checkpoint recovery function, without requiring a special interface circuit or two independent computers.

Another object is to provide a software layer between an operating system kernel and existing device drivers that restores the state of the I/O devices when the computer system rolls back upon a fault.
According to the present invention, there is provided an I/O control apparatus in a computer system which has one or more CPUs, a main memory, and one or more I/O devices, in which the CPUs periodically save the internal state of the CPUs and the contents of the main memory as a checkpoint, and in which the internal state of the CPUs and the contents of the main memory at the most recent checkpoint are restored to restart data processing when a fault occurs in the computer system, the apparatus comprising: I/O device state storing means for storing log data of the state setup of the I/O devices performed by the CPUs; and I/O device state restoring means for restoring the state of the I/O devices to that of the most recent checkpoint by first initializing the I/O devices and then replaying the state setup according to the log data stored by the I/O device state storing means.

According to this invention, when the CPU performs state setup, such as operation mode setup, on an I/O device, the log data of the state setup is stored in, e.g., the main memory. Upon occurrence of a fault in the computer system, the I/O device is initialized by an initialize command or a reset signal assertion, and the state setup sequence is then replayed for the I/O device according to the log data, so that the state of the I/O device is restored to the state of the most recent checkpoint.

It is often the case that part of the log data becomes unnecessary because of a new state setup, and the unnecessary part can therefore be eliminated. For example, assume that an I/O device whose initial state is "state A" is set to "state B", then set to "state C", and then a checkpoint is acquired. In this case, upon the setup of "state C", the log data for "state B" becomes unnecessary and can be eliminated. Upon the checkpoint acquisition, all log data other than the setup of "state C" becomes unnecessary and can be eliminated. Therefore, means for eliminating the unnecessary part of the log data is provided, which saves the area required for the log data. In addition, the time required for replaying the state setup sequence after the I/O device is initialized can be reduced.
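A minimal sketch of such elimination, assuming each setup fully overwrites one named parameter of one device, so that a newer value makes the older log entry for the same (device, parameter) pair unnecessary. `StateSetupLog` and its methods are hypothetical names.

```python
class StateSetupLog:
    """Hypothetical log that keeps only the newest setup per (device, parameter)."""

    def __init__(self):
        # Python dicts preserve insertion order, which doubles as the
        # oldest-to-newest replay order.
        self.entries = {}

    def record(self, device, parameter, value):
        key = (device, parameter)
        self.entries.pop(key, None)  # the superseded setup is eliminated
        self.entries[key] = value    # re-insert at the newest position

    def replay_order(self, device):
        return [(p, v) for (d, p), v in self.entries.items() if d == device]


log = StateSetupLog()
log.record("rs232c", "mode", "B")
log.record("rs232c", "mode", "C")  # the "state B" entry is dropped here
log.record("rs232c", "baud", 9600)
```

Re-inserting on every update keeps the replay order oldest to newest. If a device's parameters depend on one another, the assumption that only the last value per parameter matters would need to be checked for that device.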

For an I/O device on which no new state setup has been performed since the preceding checkpoint, initializing the I/O device and replaying the state setup sequence are unnecessary upon occurrence of a fault. For this reason, means for skipping the initialization and state setup sequence of such an I/O device is provided, which further shortens the time required for recovery.

This invention further comprises: request block creating means for creating, when an application process in the computer system makes an I/O request, a request block in the main memory which contains the information necessary to perform the I/O request; I/O execution processes for performing I/O operations by accessing the I/O devices according to the request blocks; and I/O execution process initializing means for initializing, upon restart from the most recent checkpoint following a fault, the ongoing I/O execution processes and causing the I/O operations being performed by the I/O execution processes to be performed again from the beginning.

According to the present invention, when an application process makes an I/O request, a request block containing the information necessary for the I/O operation is created and executed by an I/O execution process. The application process enters a wait state until the I/O operation ends.

Assume that a fault occurs in the computer system during the I/O operation. While the state of the computer system rolls back to the most recent checkpoint, the I/O device state restoring means restores the state of the I/O device. Upon restart from the most recent checkpoint, the I/O execution process initializing means initializes the I/O execution process responsible for the I/O operation and causes the half-finished I/O operation to be performed again from the beginning. In the restart phase, an I/O execution process simply performs the I/O operation according to the request block.

Because an I/O operation is performed by an I/O execution process rather than by the application process itself, the I/O operation can be restarted from the beginning. If the application process performed the I/O operation itself, as in the conventional way, it would be difficult or impossible to restart the I/O operation from the beginning. It should be noted that, at the most recent checkpoint, the I/O operation is halfway through.

Of the request blocks stored in the main memory, those created before the most recent checkpoint should be processed by I/O execution processes, and the execution of those created after the most recent checkpoint should be postponed until the next checkpoint.

Generally, if a rollback occurs, the second execution of data processing from the most recent checkpoint is not always the same as the first, because of the real-time clock and asynchronous events (i.e., external interrupts). Therefore, an I/O request made by the data processing before the fault may not be made, or may be made differently, by the second execution after fault recovery. It is therefore necessary to postpone the execution of request blocks created after the most recent checkpoint until a new checkpoint is acquired.
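One way to express this rule is to tag each request block with the checkpoint interval ("epoch") in which it was created and to treat only blocks from earlier intervals as executable. The sketch below assumes a simple integer epoch counter; `RequestBlock`, `RequestBlockArea`, and `lp_write` are hypothetical names. Since the request block area lives in main memory, a rollback would also discard blocks created after the checkpoint.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class RequestBlock:
    routine: str                # hypothetical: name of the driver entry point
    params: dict
    epoch: int                  # checkpoint interval in which it was created
    result: Optional[Any] = None


class RequestBlockArea:
    def __init__(self):
        self.epoch = 0          # incremented at every checkpoint acquisition
        self.blocks: List[RequestBlock] = []

    def create(self, routine, params):
        block = RequestBlock(routine, params, self.epoch)
        self.blocks.append(block)
        return block

    def on_checkpoint(self):
        self.epoch += 1

    def executable(self):
        # Only unfinished blocks created before the most recent checkpoint may run.
        return [b for b in self.blocks if b.epoch < self.epoch and b.result is None]


area = RequestBlockArea()
b1 = area.create("lp_write", {"data": "abc"})
before = area.executable()   # nothing may run yet
area.on_checkpoint()
b2 = area.create("lp_write", {"data": "def"})
after = area.executable()    # only b1 was created before the checkpoint
```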



When a checkpoint has been acquired, many request blocks become executable. It is therefore efficient to allocate many CPUs to I/O execution processes upon checkpoint acquisition, so that I/O operations are performed with little delay.

Having the CPU that executed the application process requesting an I/O operation also execute the I/O execution process responsible for that operation increases the cache hit ratio.

Determining the number of CPUs allocated to I/O execution processes appropriately, according to the number of request blocks to be processed, improves system performance.

Assume that a fault occurs in the computer system while an I/O execution process is executing a device driver routine that outputs a character string to a printer unit. Then only part of the characters may have been printed on the paper, and they cannot be erased. In this case, the application process that made the I/O request should receive an error reply, so that it can perform error recovery at the application level, as it would for a printer jam error.

Assume instead that an I/O execution process completes the execution of a device driver routine that outputs a character string to a printer unit, and then a fault occurs in the computer system. In this case, the whole character string has been printed. If the request block were executed again after the fault recovery, the output would be printed twice. Therefore, when the I/O request is an output request that was completed before the fault occurred, it is desirable that the application process that made the request receive a successful I/O completion reply, even though a fault has occurred, without the I/O request being re-executed.
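The two printer cases suggest a per-request decision after recovery. The sketch below is a hypothetical policy, not a mechanism stated in this section: it assumes each request block's progress has been recorded somewhere that is not itself rolled back, and that re-executing an input request from the beginning is harmless. `Progress` and `action_after_fault` are illustrative names.

```python
from enum import Enum


class Progress(Enum):
    NOT_STARTED = 0
    IN_PROGRESS = 1
    COMPLETED = 2


def action_after_fault(is_output, progress):
    """Hypothetical recovery policy for one request block.

    Assumes `progress` survived the rollback (e.g. it was kept outside
    the memory restored by the memory state recovery unit).
    """
    if progress is Progress.NOT_STARTED or not is_output:
        return "execute"        # not started, or input assumed safe to redo
    if progress is Progress.COMPLETED:
        return "reply_success"  # already printed in full: do not print twice
    return "reply_error"        # partly printed: let the application recover
```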

This invention can be more fully understood from the following detailed description when taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a view showing the arrangement of a computer system using a conventional checkpoint restart scheme;

FIG. 2 is a timing chart of an I/O process of the computer system shown in FIG. 1;

FIG. 3 is a view showing an arrangement in which I/O control is implemented by a computer system using a conventional checkpoint restart scheme;

FIG. 4 is a schematic view showing the arrangement of a computer system according to the first embodiment of the present invention;

FIG. 5 is a flow diagram of a config routine in the first embodiment;

FIG. 6 is a flow diagram of a checkpoint acquisition in the first embodiment;

FIG. 7 is a flow diagram of a fault recovery in the first embodiment;

FIG. 8 is a flow diagram of a config routine in the first embodiment;

FIG. 9 is a flow diagram of a checkpoint acquisition in the first embodiment;

FIG. 10 is a flow diagram of a fault recovery in the first embodiment;

FIG. 11 is a flow diagram of an application process which makes an I/O request in the second embodiment of the present invention;

FIG. 12 is a flow diagram of an I/O execution process in the second embodiment;

FIGS. 13A through 13D show how I/O operations are performed by application processes and I/O execution processes in the second embodiment;

FIG. 14 is a flow diagram of a checkpoint acquisition in the second embodiment;

FIG. 15 is a flow diagram of a fault recovery in the second embodiment;

FIG. 16 is a flow diagram of an application process which makes an I/O request in the second embodiment;

FIG. 17 is a flow diagram of an I/O execution process in the second embodiment;

FIG. 18 is a flow diagram of a checkpoint acquisition in the second embodiment;

FIGS. 19A through 19E show how I/O operations are performed with delay by the application processes and the I/O execution processes in the second embodiment;

FIG. 20 is a flow diagram of a fault recovery in the second embodiment;

FIG. 21 is a flow diagram of an application process which makes an I/O request in the third embodiment of the present invention;

FIG. 22 is a flow diagram of an I/O execution process in the third embodiment;

FIG. 23 is a flow diagram of a checkpoint acquisition in the third embodiment;

FIG. 24 is a flow diagram of a fault recovery in the third embodiment;

FIG. 25 is a flow diagram of an application process which makes an I/O request in the third embodiment; and

FIG. 26 is a flow diagram of an I/O execution process in the third embodiment.

Embodiments of the present invention will be described below with reference to the accompanying drawings.

(First Embodiment) 

The first embodiment of the present invention will be described below with reference to FIG. 4. FIG. 4 is a schematic view showing a computer system according to the first embodiment.

As shown in FIG. 4, the computer system of this embodiment comprises CPUs 1a and 1b, a memory state recovery unit 2, a main memory 3, and I/O devices 4a and 4b such as a printer and an RS232C controller.

When the contents of the main memory 3 are updated by the CPU 1a or 1b, the memory state recovery unit 2 holds the before image so that the contents of the main memory 3 can be restored. Details of a memory state recovery unit are described in C. Kubiak et al., "PENELOPE: A RECOVERY MECHANISM FOR TRANSIENT HARDWARE FAILURES AND SOFTWARE ERRORS", FTCS, 1982. The context of an application process, including its stack area and data area, is stored in the main memory 3 as context information 31. Here, an application process means a process of a conventional computer system.

The operating system 33, more specifically the printer device driver and the RS232C device driver, sets up operation modes such as the baud rate, stop bits, and parity when the system is initialized or when an application process requests it. The operation modes set up in this way are stored as log data in a state setting storage area 34.

For example, in a typical UNIX operating system, the state setup sequence for an I/O device such as an RS232C controller is performed by a device driver routine named xxconfig, whose interface is common to all device drivers. Therefore, the parameters of the config routine are preferably stored in the state setting storage area 34 of the main memory 3 at the entry of the config routine; the parameters can then be recorded in the same way regardless of the I/O device type.

FIG. 5 shows a flow diagram of a config routine for 
each I/O device. 

(1) Store the parameters of the config routine in the main memory as a state setting up value (step A1).

(2) Set up the state of the I/O device (step A2).
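Because the config interface is common to all drivers, the logging in step A1 can be added once, outside the individual drivers. A minimal Python sketch, assuming a config routine takes a device object plus keyword parameters; `logged_config`, `rs232c_config`, and `state_setting_area` are hypothetical stand-ins, not UNIX kernel interfaces.

```python
import functools

state_setting_area = []  # stands in for the state setting storage area 34


def logged_config(config_routine):
    """Wrap an xxconfig-style routine so its parameters are logged first."""

    @functools.wraps(config_routine)
    def wrapper(device, **params):
        # Step A1: record the parameters in main memory before touching the device.
        state_setting_area.append((config_routine.__name__, device["name"], dict(params)))
        # Step A2: let the driver set up the device.
        return config_routine(device, **params)

    return wrapper


@logged_config
def rs232c_config(device, baud, parity):
    # Hypothetical driver routine: a real one would program controller registers.
    device["baud"], device["parity"] = baud, parity


tty = {"name": "tty0"}
rs232c_config(tty, baud=9600, parity="even")
```

Logging before the device is touched means that, if a fault interrupts step A2, the log already holds the setup that was in progress.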

FIG. 6 shows a flow diagram of a checkpoint acqui- 
sition. 

(1) Save the internal state of the CPU, i.e., the contents of its registers and its cached data, into the main memory (step B1).

(2) Clear the data held in the memory state recovery unit.

FIG. 7 shows a flow diagram in which the setup sequence is replayed from the most recent checkpoint upon occurrence of a fault.

(1) Initialize the I/O device by a reset command or a reset signal assertion.

(2) Restore the state of the main memory to the state of the most recent checkpoint by using the memory state recovery unit (step C2). As a result, the log of state setups of the I/O device is restored to the state of the most recent checkpoint.

(3) Execute the config routines again using the config parameters stored in the main memory (step C3). This re-execution is performed from the oldest entry to the newest. As a result, the state of the I/O device is recovered to the state of the checkpoint.

(4) Restart the data processing that was being performed at the checkpoint (step C4).

Data processing is thus restarted with the state of each I/O device restored to the state of the checkpoint. This means that the checkpoint recovery mechanism and the I/O device restoring mechanism are realized in a single computer.
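The FIG. 5 to FIG. 7 flow can be exercised end to end in a small simulation. This is a hedged sketch, not the patented implementation: the device is modeled as a dict of mode values, `Device`, `System`, and `config` are hypothetical, and a copy of the log taken at the checkpoint stands in for the memory state recovery unit. During replay the setup is applied directly rather than by calling `config`, so the restored log is not appended to a second time.

```python
class Device:
    """Hypothetical I/O device whose set-up state is a dict of mode values."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def reset(self):  # reset command or reset signal assertion
        self.state = {}


def config(system, device, **params):
    """FIG. 5: log the parameters (step A1), then set up the device (step A2)."""
    system.log.append((device.name, dict(params)))
    device.state.update(params)


class System:
    def __init__(self, devices):
        self.devices = {d.name: d for d in devices}
        self.log = []              # state setting storage area in main memory
        self._checkpoint_log = []  # stand-in for the memory state recovery unit

    def acquire_checkpoint(self):  # FIG. 6 (saving CPU state is not modeled)
        self._checkpoint_log = list(self.log)

    def recover(self):  # FIG. 7
        for device in self.devices.values():
            device.reset()                     # (1) initialize the devices
        self.log = list(self._checkpoint_log)  # (2) roll back memory (step C2)
        for name, params in self.log:          # (3) replay oldest to newest (step C3)
            self.devices[name].state.update(params)
        # (4) data processing would restart from the checkpoint here (step C4)


tty = Device("rs232c")
system = System([tty])
config(system, tty, baud=9600, parity="even")
system.acquire_checkpoint()
config(system, tty, baud=19200)  # set after the checkpoint, then a fault occurs
system.recover()
```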

For an RS232C controller, when operation modes such as the baud rate, stop bits, and parity are newly set, the old setting values become unnecessary. Therefore, it is sufficient to hold only the latest values of the operation modes of the RS232C controller 4b in the state setting storage area 34 of the main memory 3 (unnecessary log data is discarded). This reduces the fault recovery time.

It is effective to arrange a state holding area 36 in the main memory 3 to manage state setting up flags for the I/O devices 4a and 4b. The flags are managed as follows. A state setting up flag that is ON indicates that some state setting up sequence has been performed, or is being performed, on the I/O device since the most recent checkpoint; a flag that is OFF indicates that no state setting has been performed on the I/O device since the most recent checkpoint.

FIG. 8 shows a flow diagram of a config routine when the above state setting up flags are employed.

(1) Store the parameters of the config routine in the main memory (step D1). Turn on the state setting up flag of the I/O device.

(2) Set up the state of the I/O device (step D2).

FIG. 9 shows a flow diagram of a checkpoint acquisition in this case.

(1) Save the internal state of the CPU into the main memory (step E1).

(2) Turn off the state setting up flag of each I/O device (step E2).

(3) Clear the data held in the memory state recovery unit (step E3).

FIG. 10 shows a flow diagram of fault recovery in this case.

(1) If the state setting up flag of an I/O device is ON, initialize the I/O device, since this implies that a new state has been set up since the most recent checkpoint (step F1). An I/O device whose state setting up flag is OFF, on the other hand, does not need to be initialized.

(2) Restore the state of the main memory to the state of the most recent checkpoint by using the memory state recovery unit (step F2). As a result, the log of state setups of the I/O device is restored to the state of the most recent checkpoint.

(3) For only the I/O devices initialized in step F1, execute the config routines again using the config parameters stored in the main memory (step F3). This re-execution is performed from the oldest entry to the newest. As a result, the state of each such I/O device is recovered to the state of the most recent checkpoint.

(4) Restart the data processing that was being performed at the most recent checkpoint (step F4).

In this manner, fault recovery can skip initializing an I/O device whose state setting up flag is OFF, which results in faster fault recovery.
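Adding the state setting up flags to a similar model shows which devices the FIG. 10 recovery skips. Again a hypothetical sketch: all names are illustrative, a log copy stands in for the memory state recovery unit, and a reset counter is included only to make the skipped initialization visible.

```python
class Device:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.reset_count = 0  # only for observing which devices were initialized

    def reset(self):
        self.state = {}
        self.reset_count += 1


class System:
    def __init__(self, devices):
        self.devices = {d.name: d for d in devices}
        self.log = []
        self.flag = {name: False for name in self.devices}  # state holding area 36
        self._checkpoint_log = []  # stand-in for the memory state recovery unit

    def config(self, name, **params):  # FIG. 8
        self.log.append((name, dict(params)))    # step D1: log the parameters
        self.flag[name] = True                   # and turn the flag on
        self.devices[name].state.update(params)  # step D2

    def acquire_checkpoint(self):  # FIG. 9
        for name in self.flag:                   # step E2
            self.flag[name] = False
        self._checkpoint_log = list(self.log)    # step E3: memory becomes the checkpoint

    def recover(self):  # FIG. 10
        dirty = [name for name, on in self.flag.items() if on]
        for name in dirty:                       # step F1: only flagged devices
            self.devices[name].reset()
        self.log = list(self._checkpoint_log)    # step F2
        for name, params in self.log:            # step F3: replay for those devices only
            if name in dirty:
                self.devices[name].state.update(params)
        for name in dirty:                       # flags were OFF at the checkpoint
            self.flag[name] = False
        return dirty                             # step F4 (restart) is not modeled


printer, tty = Device("printer"), Device("rs232c")
system = System([printer, tty])
system.config("printer", density="high")
system.config("rs232c", baud=9600)
system.acquire_checkpoint()
system.config("rs232c", baud=19200)  # only the RS232C controller changes afterwards
reinitialized = system.recover()
```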

(Second Embodiment) 

It is assumed that a computer system according to this embodiment comprises, in addition to the arrangement of the computer system described in the first embodiment, a request block storage area 35 and I/O execution processes (FIG. 4).

In this embodiment, when an application process makes a system call to request an I/O operation on an I/O device, the operating system calls a request block create routine instead of calling the device driver routine in the context of the caller process. The request block create routine creates a request block containing the entry address of the device driver routine and its parameters. The application (caller) process then enters a wait state. The request block is simply held in the main memory until a new checkpoint is acquired. Therefore, if there are many application processes in the computer system, the number of request blocks held in the main memory increases as time passes.

There are a certain number of I/O execution processes in the system. An I/O execution process is a special process that executes device driver routines according to a request block.

When a new checkpoint has been acquired, the request blocks become ready to be processed. An I/O execution process in its initial state is allocated to one of the request blocks. The I/O execution process executes the device driver routine with the appropriate parameters, both of which are designated by the request block. Therefore, the number of I/O operations performed concurrently depends on the number of I/O execution processes. The I/O execution process enters a wait state when it starts the I/O device within the designated device driver routine. When the I/O device returns a termination interrupt, the interrupt handling routine of the device driver is called and the result is reflected in the context of the I/O execution process. The I/O execution process then becomes ready. At the end of the I/O operation, the I/O execution process reports the result to the application process via the request block.

If a fault occurs in the computer system during a DMA transfer from an I/O device to the main memory, the DMA transfer must be stopped by initializing the I/O device before the main memory is restored. For this purpose, an in-operation flag is provided for each I/O unit with DMA capability, to determine whether each I/O device must be initialized. An in-operation flag is kept ON only while the corresponding I/O device is performing a DMA transfer.

FIG. 11 shows an I/O request flow diagram per- 
formed by an application process in this embodiment. 

(1) Store the parameters relating to an I/O operation in the main memory as a request block (step G1).

(2) Enter a wait state until the I/O operation described in the request block is completed (step G2).

(3) When the application process resumes execution, perform the completion step on the application process side for the I/O request, with reference to the result code field of the request block, and then execute the succeeding step (step G3).

FIG. 12 shows a flow diagram of an I/O execution process. Here, it is assumed that multiple I/O execution processes are executed concurrently.

(1) Wait for an executable request block (initial state, step H1).

(2) Set up the registers of the I/O device and turn on the in-operation flag in accordance with the request block, thereby starting the I/O device (step H2).

(3) Upon receiving a completion interrupt from the I/O device, turn off the in-operation flag, perform the completion step of the I/O request, write the result code in the request block, and put the application process that has been in the wait state into the ready state (step H3).
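The division of work in FIGS. 11 and 12 can be sketched with ordinary threads: the application thread builds a request block and waits, and an I/O execution thread runs the driver routine and reports back through the block. This is a simplified Python model, not operating system code: the driver routine returns instead of raising a completion interrupt, blocks are executable immediately (the deferral until the next checkpoint is left out), and `fake_disk_read` and the `None` shutdown sentinel are hypothetical.

```python
import queue
import threading


class RequestBlock:
    def __init__(self, routine, params):
        self.routine = routine         # device driver routine to execute
        self.params = params
        self.result = None             # result code field
        self.done = threading.Event()  # the application waits on this


in_operation = {}             # device name -> in-operation flag
ready_blocks = queue.Queue()  # executable request blocks


def io_execution_process():  # FIG. 12
    while True:
        block = ready_blocks.get()        # step H1: wait for a request block
        if block is None:                 # shutdown sentinel for this sketch
            return
        device = block.params["device"]
        in_operation[device] = True       # step H2: device started
        result = block.routine(**block.params)
        in_operation[device] = False      # step H3: completion handled
        block.result = result             # write the result code
        block.done.set()                  # make the application ready


def application_io(routine, **params):  # FIG. 11
    block = RequestBlock(routine, params)  # step G1
    ready_blocks.put(block)
    block.done.wait()                      # step G2
    return block.result                    # step G3: inspect the result code


def fake_disk_read(device, nbytes):  # hypothetical driver routine
    return ("OK", b"x" * nbytes)


worker = threading.Thread(target=io_execution_process)
worker.start()
status = application_io(fake_disk_read, device="disk", nbytes=3)
ready_blocks.put(None)  # stop the I/O execution thread
worker.join()
```

Because the driver runs on the I/O execution thread, that thread can be discarded and restarted at step H1 without touching the application thread's context.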

FIGS. 13A through 13D show a sample sequence of I/O operations performed by two application processes and two I/O execution processes.

When an I/O request is made by an application process, a corresponding request block is created and stored in the memory (FIGS. 13A and 13B). After the request block is created, the I/O processing (register setup, starting, and completion interrupt processing of the I/O device) is executed in the context of an I/O execution process (FIGS. 13C and 13D).

FIG. 14 shows a flow diagram of checkpoint
acquisition in this embodiment.

(1) Save the internal state of the CPU into the main
memory (step I1).

(2) Turn off the state setting up flag of each I/O
device (step I2).

(3) Clear the data held in the memory state recovery
unit (step I3).
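Checkpoint acquisition (FIG. 14) touches only three pieces of state. The sketch below assumes, purely for illustration, that the memory state recovery unit works as an undo log of before-images; this section does not say how the unit is built, and all names here are hypothetical.

```python
from dataclasses import dataclass, field

_ABSENT = object()  # marks an address that did not exist at the checkpoint


@dataclass
class MemoryStateRecoveryUnit:
    """Hypothetical undo log holding the checkpoint-time value of each modified address."""
    before_images: dict = field(default_factory=dict)

    def record(self, memory: dict, addr: str) -> None:
        # Only the first write after a checkpoint needs a before-image.
        if addr not in self.before_images:
            self.before_images[addr] = memory.get(addr, _ABSENT)

    def restore(self, memory: dict) -> None:
        # Roll main memory back to its contents at the most recent checkpoint.
        for addr, old in self.before_images.items():
            if old is _ABSENT:
                memory.pop(addr, None)
            else:
                memory[addr] = old
        self.before_images.clear()

    def clear(self) -> None:
        self.before_images.clear()


def write(memory: dict, unit: MemoryStateRecoveryUnit, addr: str, value) -> None:
    """Every store to main memory goes through the recovery unit first."""
    unit.record(memory, addr)
    memory[addr] = value


def acquire_checkpoint(cpu_state: dict, memory: dict, state_setting_flags: dict,
                       unit: MemoryStateRecoveryUnit) -> None:
    # I1: save the CPU's internal state (written directly; I3 discards the log anyway).
    memory["cpu_state"] = dict(cpu_state)
    for dev in state_setting_flags:
        state_setting_flags[dev] = False    # I2: no device changed since this checkpoint
    unit.clear()                            # I3: older before-images are no longer needed
```

Clearing the log at step I3 is what makes the new checkpoint the rollback target: after it, `restore` can only undo writes made since this checkpoint.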

FIG. 15 shows a flow diagram of fault recovery
upon occurrence of a fault in this embodiment.

(1) If the state setting up flag or the in-operation
flag of an I/O device is ON, initialize that I/O device,
and turn off its state setting up flag and in-operation
flag (step J1).

(2) Restore the main memory to its state at the most
recent checkpoint by using the memory state
recovery unit (step J2).

(3) For only the I/O devices initialized at step J1,
replay the state setting up sequence from the oldest
entry to the newest with reference to the log of state
setting up values held in the main memory (step J3).
In this manner, the state of each such I/O device is
recovered to its state at the most recent checkpoint.

(4) Initialize the I/O execution processes. More
specifically, every I/O execution process is set to
step H1, irrespective of its state at the most recent
checkpoint (step J4).

(5) Restart the data processing that was being
performed at the most recent checkpoint (step J5).
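The order of the recovery steps in FIG. 15 matters: devices are stopped first (J1) so that no DMA transfer can overwrite main memory after it has been rolled back (J2), and only then is the state setting log replayed (J3). The following is a minimal sketch under stated assumptions: the set log is taken to be a list of `(device, register, value)` tuples held in main memory, `restore_memory` is a caller-supplied rollback, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    registers: dict = field(default_factory=dict)
    state_setting_flag: bool = False   # ON if state was set since the last checkpoint
    in_operation: bool = False         # ON while a DMA transfer may be running

    def initialize(self) -> None:
        self.registers.clear()         # hypothetical reset: stops DMA, clears settings


def recover(devices: dict, set_log: list, restore_memory, io_processes: list) -> list:
    # J1: initialize devices whose state changed or that may be doing DMA.
    reset = []
    for dev in devices.values():
        if dev.state_setting_flag or dev.in_operation:
            dev.initialize()
            dev.state_setting_flag = dev.in_operation = False
            reset.append(dev.name)
    # J2: roll main memory (including the set log) back to the checkpoint.
    restore_memory()
    # J3: replay, oldest entry first, the logged settings of the reset devices only.
    for name, reg, value in set_log:
        if name in reset:
            devices[name].registers[reg] = value
    # J4: every I/O execution process returns to its initial wait state (H1).
    for proc in io_processes:
        proc["state"] = "H1"
    # J5 (restarting data processing) is left to the caller.
    return reset
```

Skipping devices whose flags are both OFF corresponds to the optimization recited in claim 3: a device untouched since the checkpoint already holds its checkpoint state.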

It is important that control of an I/O device is
performed in the context of an I/O execution process,
which is separate from the application process.
Consequently, the initialization at step J4 can be
performed without any influence on the application
process that requested the I/O operation. In the prior
art, since an I/O operation is performed in the
application process context, the above initialization
would require a much more complicated or ad hoc process.
It is effective to add an execution permission flag to
each request block. The execution permission flag
remains OFF until a new checkpoint is acquired and is
turned ON when the new checkpoint has been acquired.

FIG. 16 shows a flow diagram performed by an
application process when the execution permission flag
is added.

(1) Store the parameters of a unit I/O operation in
the main memory as a request block, and turn off the
execution permission flag of the request block
(step K1).

(2) Enter a wait state until the I/O operation
designated in the request block is completed
(step K2).

(3) When the application process resumes
execution, perform the application-side completion
step for the I/O request with reference to the result
code field of the request block, and then execute the
succeeding step (step K3).

FIG. 17 shows a flow diagram of an I/O execution
process. In this case, it is assumed that multiple I/O
execution processes are executed concurrently.

(1) Wait for a request block whose execution
permission flag is ON (step L1).

(2) In accordance with the request block, set up the
registers of the I/O device and turn on the
in-operation flag, thereby starting the I/O device
(step L2).

(3) Upon receiving a completion interrupt from the
I/O device, turn off the in-operation flag, perform the
completion step of the I/O request, write the result
code in the request block, and put the application
process that has been in the wait state into the ready
state (step L3).

FIG. 18 shows a flow diagram of checkpoint
acquisition.

(1) Save the internal state of the CPU into the main
memory (step M1).

(2) Turn off the state setting up flag of each I/O
device, and turn on the execution permission flag of
each request block stored in the main memory
(step M2).

(3) Clear the data held in the memory state recovery
unit (step M3).

FIGS. 19A through 19E show how I/O requests
are performed with two application processes, a
checkpoint acquisition, and two I/O execution
processes.

When an application process makes an I/O request,
a corresponding request block is created and stored in
the memory (FIGS. 19A and 19B). Since the execution
permission flag of the request block remains OFF until
a checkpoint is acquired, the I/O execution processes
stay idle.

When a checkpoint has been acquired, the execution
permission flag of the request block is turned ON
(FIG. 19C). An I/O execution process then takes a
request block whose execution permission flag is ON
and executes the I/O operation (setting up the registers
of the I/O device, starting the I/O device, and handling
its completion interrupt) (FIGS. 19D and 19E).
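The gating behaviour of FIGS. 16 through 19 can be sketched as follows. The list-based request store and the `taken` field (marking a block already claimed by an I/O execution process) are assumptions for illustration, and all names are hypothetical; steps M1 and M3 are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestBlock:
    params: dict
    exec_permitted: bool = False  # step K1: created with the flag OFF
    taken: bool = False           # hypothetical: already claimed by an I/O execution process


def make_request(blocks: List[RequestBlock], params: dict) -> RequestBlock:
    """Application process, step K1: store a new request block in main memory."""
    block = RequestBlock(params)
    blocks.append(block)
    return block


def take_executable(blocks: List[RequestBlock]) -> Optional[RequestBlock]:
    """I/O execution process, step L1: claim the first permitted, unclaimed block."""
    for block in blocks:
        if block.exec_permitted and not block.taken:
            block.taken = True
            return block
    return None


def acquire_checkpoint(blocks: List[RequestBlock]) -> None:
    """Step M2, permission part only: every block now in memory becomes executable."""
    for block in blocks:
        block.exec_permitted = True
```

In this sketch a block created after a checkpoint cannot be claimed until the next checkpoint, so any block an I/O execution process starts was already in main memory at the most recent checkpoint and is therefore still present after a rollback, which is what lets the interrupted I/O operation be re-executed from the beginning.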

FIG. 20 shows a flow diagram of fault recovery
and re-execution of an I/O operation.

(1) If the state setting up flag or the in-operation
flag of an I/O device is ON, initialize the I/O device,
and turn off its state setting up flag and in-operation
flag (step N1).

(2) Restore the main memory to its state at the most
recent checkpoint by using the memory state
recovery unit (step N2).

(3) For only the I/O devices initialized at step N1,
state setting up is performed again from the oldest
entry to the newest with reference to the log of state
setting up stored in the main memory (step N3). The
state of each such I/O device is thus recovered to its
state at the most recent checkpoint.

(4) Initialize the ongoing I/O execution processes
(step N4). More specifically, every I/O execution
process is set to the state of step L1, irrespective of
its state at the most recent checkpoint.

(5) Restart the data processing that was being
performed at the most recent checkpoint (step N5).



Assume that a fault occurs in the middle of an I/O
operation. When the fault occurs, the in-operation flag
of the I/O device is ON, and therefore the I/O device is
initialized. The contents of the request block are rolled
back to those of the most recent checkpoint by the
memory state recovery unit. Then, the state of the I/O
device is recovered by re-performing the set up
sequence held in the main memory (step N3). When the
fault recovery has been completed, an I/O execution
process takes the request block and re-executes the I/O
operation from the beginning according to the request
block.

This is how an I/O operation interrupted halfway is
re-performed after fault recovery.

When a checkpoint has been acquired, the request
blocks created since the preceding checkpoint become
executable. It is therefore appropriate to raise the
priority of the I/O execution processes after a
checkpoint acquisition, so that the delayed I/O
requests are executed immediately.
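One way to realize the priority raise is sketched below with a binary-heap run queue. The `Proc` record, the `boost` amount and the convention that a smaller number runs first are all hypothetical choices for illustration; the text only says the priority should be set higher after a checkpoint.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Proc:
    priority: int                                  # smaller value runs first
    name: str = field(compare=False)
    is_io_exec: bool = field(default=False, compare=False)


def on_checkpoint_acquired(run_queue: list, boost: int = 10) -> None:
    """Hypothetical policy: raise every I/O execution process after a checkpoint."""
    for proc in run_queue:
        if proc.is_io_exec:
            proc.priority -= boost
    heapq.heapify(run_queue)  # priorities changed in place, so rebuild heap order
```

A real scheduler would presumably drop the boost again once the executable request blocks have been drained; that step is not shown.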

To keep the cache hit ratio high, the CPU that
executed the application process making an I/O request
should also be assigned to the I/O execution process
responsible for the request block created for that I/O
request.

A preferable embodiment is as follows.

A request block has a CPU identifier field. When the
request block is created, the identifier of the CPU
executing the application process is written into the
CPU identifier field. An I/O execution process takes a
request block whose CPU identifier matches the CPU
executing that I/O execution process.
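The CPU identifier matching described above can be sketched as follows. The field and function names are hypothetical, the `taken` field is an assumed way of marking a claimed block, and the execution permission flag is left out to keep the example focused.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestBlock:
    params: dict
    cpu_id: int          # CPU that ran the requesting application process
    taken: bool = False  # hypothetical: already claimed by an I/O execution process


def create_request(blocks: List[RequestBlock], params: dict, current_cpu: int) -> None:
    """Record the requesting CPU's identifier in the new request block."""
    blocks.append(RequestBlock(params, current_cpu))


def take_for_cpu(blocks: List[RequestBlock], my_cpu: int) -> Optional[RequestBlock]:
    """Claim only request blocks whose CPU identifier matches the CPU running this process."""
    for block in blocks:
        if not block.taken and block.cpu_id == my_cpu:
            block.taken = True
            return block
    return None
```

Strict matching as written here could leave a block waiting if its CPU is busy; whether to fall back to another CPU in that case is a design choice the text does not address.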

The number of CPUs assigned to the I/O execution
processes should be determined according to the number
of executable request blocks and the number of CPUs in
the computer. If the number of executable request
blocks increases, more CPUs should be assigned to the
I/O execution processes. In one preferable embodiment,
the scheduler of the computer determines whether an
idle CPU is assigned to an application process or to an
I/O execution process depending on the number of
executable request blocks.
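A possible scheduler rule is sketched below. The sizing formula (one I/O CPU per `blocks_per_cpu` executable request blocks, with at least one CPU left for application processes on a multiprocessor) is a hypothetical policy chosen for illustration, not one given in the text.

```python
import math


def desired_io_cpus(executable_blocks: int, total_cpus: int, blocks_per_cpu: int = 4) -> int:
    """Hypothetical sizing rule for the number of CPUs running I/O execution processes."""
    if executable_blocks == 0:
        return 0
    wanted = math.ceil(executable_blocks / blocks_per_cpu)
    # Keep one CPU for applications, except on a uniprocessor, which must do both.
    return min(wanted, max(total_cpus - 1, 1))


def assign_idle_cpu(executable_blocks: int, total_cpus: int, io_cpus_now: int) -> str:
    """Decide whether an idle CPU should run an I/O execution process or an application."""
    if io_cpus_now < desired_io_cpus(executable_blocks, total_cpus):
        return "io_execution"
    return "application"
```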

(Third Embodiment) 

A computer system according to this embodiment
comprises, in addition to the arrangement of the
computer system described in the second embodiment, a
state holding area 36.

In this embodiment, attention is focused on an I/O
device such as a printer. If a fault occurs in the
computer while a slip is being printed, the slip may be
left in an incomplete state (i.e., the printer can neither
be restored to its state at the most recent checkpoint
nor finish printing the slip completely).

To detect such a state, an execution interruption
error flag is added to each request block in this
embodiment. The execution interruption error flag is
used to identify whether the I/O operation designated by
a request block has resulted in an unrecoverable I/O
error because of a fault occurrence in the computer
system.

FIG. 21 shows a flow diagram performed by an
application process in this embodiment.

(1) Store the parameters of an I/O operation in the
main memory as a request block (step O1). Turn off
the execution permission flag and the execution
interruption error flag of the request block.

(2) Enter a wait state until the I/O operation
designated in the request block is completed
(step O2).

(3) When the application process resumes
execution, perform the application-side completion
step for the I/O request with reference to the result
code field of the request block, and then execute the
succeeding step (step O3).

FIG. 22 shows a flow diagram of an I/O execution
process.

(1) Wait for a request block whose execution
permission flag is ON (step P1).

(2) If the execution interruption error flag of the
request block is ON, set an error code in the result
code field of the request block, and put the application
process that has been in the wait state into the ready
state (steps P2, P5).

(3) Otherwise, in accordance with the request block,
set up the registers of the I/O device and turn on the
in-operation flag (step P3). If the I/O device is a
printer, turn on the execution interruption error flag of
the request block.

(4) Upon receiving a completion interrupt from the
I/O device, turn off the in-operation flag, perform the
completion step of the I/O request, write the result
code in the request block, and put the application
process that has been in the wait state into the ready
state (step P4). If the I/O device is a printer, turn off
the execution interruption error flag of the request block.
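Steps P2 to P4 can be sketched as follows. The error and result code strings, the dictionary used as a device, and all function names are hypothetical; the sketch only shows how the flag brackets a printer operation so that a fault in between leaves it ON.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestBlock:
    device_kind: str                 # e.g. "printer" or "disk"
    params: dict
    interrupted_error: bool = False  # execution interruption error flag (OFF at step O1)
    result_code: Optional[str] = None
    waiter_ready: bool = False       # stands in for waking the application process


def start_request(block: RequestBlock, device: dict) -> bool:
    """Steps P2-P3: refuse a block that a fault interrupted, otherwise start the device."""
    if block.interrupted_error:
        block.result_code = "ERROR_INTERRUPTED"   # P2, P5: error reply, no re-execution
        block.waiter_ready = True
        return False
    device["registers"] = dict(block.params)     # P3: set up the registers
    device["in_operation"] = True
    if block.device_kind == "printer":
        block.interrupted_error = True           # stays ON if a fault strikes mid-print
    return True


def complete_request(block: RequestBlock, device: dict) -> None:
    """Step P4: completion interrupt handling."""
    device["in_operation"] = False
    block.result_code = "OK"
    block.waiter_ready = True
    if block.device_kind == "printer":
        block.interrupted_error = False
```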

FIG. 23 shows a flow diagram of checkpoint
acquisition in this embodiment.

(1) Save the internal state of the CPU into the main
memory (step Q1).

(2) Turn off the state setting up flag of each I/O
device, and turn on the execution permission flag of
each request block stored in the main memory
(step Q2).

(3) Clear the data held in the memory state recovery
unit (step Q3).

FIG. 24 shows a flow diagram of fault recovery
upon occurrence of a fault in this embodiment.

(1) If the state setting up flag or the in-operation
flag of an I/O device is ON, initialize the I/O device,
and turn off its state setting up flag and in-operation
flag (step R1).

(2) Restore the main memory to its state at the most
recent checkpoint by using the memory state
recovery unit.

The value of the execution interruption error flag of
a request block corresponding to a printer must remain
unchanged through the restoration of the main memory.
This can be realized, for instance, as follows. An
ordinary computer system has an NVRAM (nonvolatile
memory) for holding system parameters, and data
updates in the NVRAM can be controlled so that they
are not restored by the memory state recovery unit.
Therefore, when a fault occurs, the value of the
execution interruption error flag is saved in the NVRAM
before the main memory is restored, and the saved
value is written back into the flag after the main
memory has been restored (step R2).

(3) For only the I/O devices initialized at step R1,
the state set up sequence is re-performed from the
oldest entry to the newest with reference to the log of
state setting up held in the main memory (step R3). In
this manner, the state of each such I/O device is
recovered to its state at the most recent checkpoint.

(4) Initialize the ongoing I/O execution processes
(step R4). More specifically, every I/O execution
process is set to the state of step P1, irrespective of
its state at the most recent checkpoint.

(5) Restart the data processing that was being
performed at the most recent checkpoint (step R5).

When a fault occurs in the middle of a printer I/O
operation according to a request block, the execution
interruption error flag is ON, and it remains unchanged
through the main memory restoration. An I/O execution
process then tries to re-execute the request block and
finds that the execution interruption error flag is ON (at
step P2). Instead of re-executing the I/O operation, the
I/O execution process sets an error code in the result
code field of the request block and puts the application
process into the ready state.
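The NVRAM technique of step R2 can be sketched as below. NVRAM is modelled as a dictionary that the rollback does not touch, `restore_main_memory` is a caller-supplied function returning the rolled-back request blocks, and the `("ierr", block_id)` key scheme is a hypothetical way of keeping these entries apart from other system parameters.

```python
from dataclasses import dataclass


@dataclass
class RequestBlock:
    block_id: int
    device_kind: str
    interrupted_error: bool = False


def restore_with_nvram(blocks, nvram: dict, restore_main_memory):
    """Step R2: roll main memory back but keep printer blocks' interruption flags."""
    saved_ids = []
    for blk in blocks:
        if blk.device_kind == "printer":
            nvram[("ierr", blk.block_id)] = blk.interrupted_error  # save outside main memory
            saved_ids.append(blk.block_id)
    restored = restore_main_memory()          # returns the checkpoint-time request blocks
    for blk in restored:
        key = ("ierr", blk.block_id)
        if key in nvram:
            blk.interrupted_error = nvram[key]  # write the saved value back
    for block_id in saved_ids:
        del nvram[("ierr", block_id)]         # drop only the entries added here
    return restored
```

Because the write-back happens after the rollback, a printer block whose flag was OFF at the checkpoint but ON when the fault struck comes back with the flag ON, which is what step P2 then detects.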

In this manner, when a printer I/O operation is
interrupted halfway by a fault, the printer I/O operation
is not repeated; instead, the application process copes
with the error as it would with a printer jam error.

In the case of a printer, it is more suitable for the
execution interruption error flag to be a ternary flag,
i.e., completion/in-execution/non-execution.

FIG. 25 shows a flow diagram performed by an
application process in this case.

(1) Store the parameters of an I/O operation in the
main memory as a request block (step S1). Turn off
the execution permission flag of the request block, and
set its execution interruption error flag to
non-execution.

(2) Enter a wait state until the I/O operation
described in the request block is completed
(step S2).

(3) When the application process resumes
execution, perform the application-side completion
step for the I/O request with reference to the result
code field of the request block, and then execute the
succeeding step (step S3).

FIG. 26 shows a flow diagram of an I/O execution
process.

(1) Wait for a request block whose execution
permission flag is ON (step T1).

(2) If the execution interruption error flag of the
request block is in-execution, set an error code in the
result code field of the request block, and put the
application process that has been in the wait state into
the ready state (steps T2, T6).

(3) Otherwise, if the execution interruption error flag
is completion, set a completion code in the result code
field of the request block, and put the application
process that has been in the wait state into the ready
state (steps T3, T7).

(4) Otherwise, in accordance with the request block,
set up the registers of the I/O device and turn on the
in-operation flag (step T4).

(5) Upon receiving a completion interrupt from the
I/O device, turn off the in-operation flag, set the
execution interruption error flag to completion, and put
the application process that has been in the wait state
into the ready state (step T5).
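The ternary flag of FIGS. 25 and 26 can be sketched with an enumeration. Two details are assumptions, flagged in the comments: the text does not state where the flag becomes in-execution, so the sketch sets it at step T4 by analogy with step P3, and it writes a completion code at step T5 by analogy with step P4. Code strings and names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExecState(Enum):
    NON_EXECUTION = "non-execution"   # value set at step S1
    IN_EXECUTION = "in-execution"
    COMPLETION = "completion"


@dataclass
class RequestBlock:
    params: dict
    state: ExecState = ExecState.NON_EXECUTION
    result_code: Optional[str] = None


def dispatch(block: RequestBlock, printer: dict) -> str:
    """Steps T2-T4 for a request block taken at step T1."""
    if block.state is ExecState.IN_EXECUTION:      # T2, T6: a fault interrupted printing
        block.result_code = "ERROR_INTERRUPTED"
        return "error"
    if block.state is ExecState.COMPLETION:        # T3, T7: already printed, do not repeat
        block.result_code = "OK"
        return "completed"
    printer["registers"] = dict(block.params)      # T4: start the printer
    printer["in_operation"] = True
    block.state = ExecState.IN_EXECUTION           # assumed here, by analogy with step P3
    return "started"


def on_print_complete(block: RequestBlock, printer: dict) -> None:
    """Step T5: completion interrupt handling."""
    printer["in_operation"] = False
    block.state = ExecState.COMPLETION
    block.result_code = "OK"                       # assumed, by analogy with step P4
```

Distinguishing completion from in-execution is what lets a reprint be avoided in both cases: an interrupted slip yields an error reply, while a finished slip yields a normal reply without printing it twice.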

When a fault occurs after the end of a printer I/O
operation, the execution interruption error flag is
completion, and it remains unchanged through the main
memory restoration. An I/O execution process then tries
to re-execute the request block and finds that the
execution interruption error flag shows completion (at
step T3). Instead of re-executing the I/O operation, the
I/O execution process sets a completion code in the
result code field of the request block and puts the
application process into the ready state.

In this manner, when a printer I/O operation has
been completed before a fault occurs in the computer,
the printer I/O operation is not repeated, and the
application process receives a normal result code.

Claims 

1. An I/O control apparatus in a computer system
which has one or more CPUs (1a, 1b), a main
memory (3), and one or more I/O devices (4a, 4b)
and characterized in that said CPUs periodically
save the internal state of said CPUs and the
contents of said main memory as a checkpoint, and
the internal state of said CPUs and the contents of
said main memory of the most recent checkpoint are
restored when a fault occurs in said computer
system to restart data processing, comprising:

I/O device state storing means (34, A1, A2, 1a)
for storing log data of state setting of said I/O
devices performed by said CPUs; and
I/O device state restoring means (1a, C1, C2,
C3) for restoring the state of said I/O devices to
that of the most recent checkpoint by first
initializing said I/O devices and second replaying
the state setting up sequence according to said log
data stored by said I/O device state storing
means.

2. An apparatus according to claim 1, characterized in
that said storing means includes means (1a, C2) for
erasing part of the existing log data which is made
unnecessary by setting up a new state.

3. An apparatus according to claim 1, characterized in
that said I/O device state restoring means includes
means (1a, F1) for skipping the initializing and the
replaying of the state setting up sequence of an I/O
device for which no new state setting has been
performed since the most recent checkpoint.

4. An apparatus according to claim 1, characterized
by further comprising:

request block creating means (35) for creating,
when an application process in said computer
system makes an I/O request, a request block
in said main memory which contains information
necessary to perform said I/O request;

I/O execution processes (32, FIG. 4, H1, H2, H3
in FIG. 12) for performing an I/O operation by
executing I/O device driver routines according to a
request block;

I/O execution process initializing means (J4
in FIG. 15) for initializing, upon restart from the
most recent checkpoint after a fault occurrence,
said I/O execution processes other than those in the
initial state and causing I/O operations being
performed by said I/O execution processes to
be performed again from the beginning.

5. An apparatus according to claim 2, characterized
by further comprising:

request block creating means (35) for creating,
when an application process in said computer
system makes an I/O request, a request block
in said main memory which contains information
necessary to perform said I/O request;

I/O execution processes (32, FIG. 4, H1, H2,
H3 in FIG. 12) for performing an I/O operation
by executing I/O device driver routines according
to a request block;

I/O execution process initializing means (J4 in
FIG. 15) for initializing, upon restart from the
most recent checkpoint after a fault occurrence,
said I/O execution processes other than those in the
initial state and causing I/O operations being
performed by said I/O execution processes to
be performed again from the beginning.

6. An apparatus according to claim 3, characterized
by further comprising:

request block creating means (35) for creating,
when an application process in said computer
system makes an I/O request, a request block
in said main memory which contains information
necessary to perform said I/O request;

I/O execution processes (32, FIG. 4, H1, H2,
H3 in FIG. 12) for performing an I/O operation
by executing I/O device driver routines according
to a request block;

I/O execution process initializing means (J4
in FIG. 15) for initializing, upon restart from the
most recent checkpoint after a fault occurrence,
said I/O execution processes other than those in the
initial state and causing I/O operations being
performed by said I/O execution processes to
be performed again from the beginning.

7. An apparatus according to claim 4, characterized in
that, of the request blocks held in said main memory,
said I/O execution processes begin to perform an
I/O operation according to a request block created
before the most recent checkpoint, while said I/O
execution processes postpone an I/O operation
according to a request block created after the most
recent checkpoint until a new checkpoint
acquisition.

8. An apparatus according to claim 5, characterized in
that, of the request blocks held in said main memory,
said I/O execution processes begin to perform an
I/O operation according to a request block created
before the most recent checkpoint, while said I/O
execution processes postpone an I/O operation
according to a request block created after the most
recent checkpoint until a new checkpoint
acquisition.

9. An apparatus according to claim 6, characterized in
that, of the request blocks held in said main memory,
said I/O execution processes begin to perform an
I/O operation according to a request block created
before the most recent checkpoint, while said I/O
execution processes postpone an I/O operation
according to a request block created after the most
recent checkpoint until a new checkpoint
acquisition.

10. An apparatus according to claim 7, characterized in
that said CPUs are assigned to said I/O execution
processes when a new checkpoint acquisition has
been completed.

11. An apparatus according to claim 8, characterized in
that said CPUs are assigned to said I/O execution
processes when a new checkpoint acquisition has
been completed.

12. An apparatus according to claim 9, characterized in
that said CPUs are assigned to said I/O execution
processes when a new checkpoint acquisition has
been completed.

13. An apparatus according to claim 4, characterized in
that the CPU which executes an application
process which made an I/O request also executes an
I/O execution process which is responsible for the
request block created based on said I/O request.

14. An apparatus according to claim 5, characterized in
that the CPU which executes an application
process which made an I/O request also executes an
I/O execution process which is responsible for the
request block created based on said I/O request.

15. An apparatus according to claim 6, characterized in
that the CPU which executes an application
process which made an I/O request also executes an
I/O execution process which is responsible for the
request block created based on said I/O request.

16. An apparatus according to claim 4, characterized in
that the number of CPUs which are assigned to
said I/O execution processes is properly
determined depending on the number of request blocks
to be processed.

17. An apparatus according to claim 5, characterized in
that the number of CPUs which are assigned to
said I/O execution processes is properly
determined depending on the number of request blocks
to be processed.

18. An apparatus according to claim 6, characterized in
that the number of CPUs which are assigned to
said I/O execution processes is properly
determined depending on the number of request blocks
to be processed.

19. An apparatus according to claim 4, characterized
by further comprising means (P2 in FIG. 22) for
making, when a fault occurs, an error reply to the
application process without re-executing the
requested I/O operation, in case that said I/O
device state restoring means does not manage to
restore the state of the I/O device which relates to
said I/O request.

20. An apparatus according to claim 5, characterized
by further comprising means for making, when a
fault occurs, an error reply to the application
process without re-executing the requested I/O
operation, in case that said I/O device state restoring
means does not manage to restore the state of the
I/O device which relates to said I/O request.

21. An apparatus according to claim 6, characterized
by further comprising means for making, when a
fault occurs, an error reply to the application
process without re-executing the requested I/O
operation, in case that said I/O device state restoring
means does not manage to restore the state of the
I/O device which relates to said I/O request.

22. An apparatus according to claim 7, characterized
by further comprising means for making, when a
fault occurs, an error reply to the application
process without re-executing the requested I/O
operation, in case that said I/O device state restoring
means does not manage to restore the state of the
I/O device which relates to said I/O request.

23. An apparatus according to claim 8, characterized
by further comprising means for making, when a
fault occurs, an error reply to the application
process without re-executing the requested I/O
operation, in case that said I/O device state restoring
means does not manage to restore the state of the
I/O device which relates to said I/O request.

24. An apparatus according to claim 9, characterized
by further comprising means for making, when a
fault occurs, an error reply to the application
process without re-executing the requested I/O
operation, in case that said I/O device state restoring
means does not manage to restore the state of the
I/O device which relates to said I/O request.

25. An apparatus according to claim 4, characterized
by further comprising means for making, when a
fault occurs, a successful I/O completion reply to
the application process without re-executing the
requested I/O operation, in case that the I/O
request is an output request and the I/O request
has been completed before the fault occurrence.

26. An apparatus according to claim 5, characterized
by further comprising means for making, when a
fault occurs, a successful I/O completion reply to
the application process without re-executing the
requested I/O operation, in case that the I/O
request is an output request and the I/O request
has been completed before the fault occurrence.

27. An apparatus according to claim 6, characterized
by further comprising means for making, when a
fault occurs, a successful I/O completion reply to
the application process without re-executing the
requested I/O operation, in case that the I/O
request is an output request and the I/O request
has been completed before the fault occurrence.

28. An apparatus according to claim 7, characterized
by further comprising means for making, when a
fault occurs, a successful I/O completion reply to
the application process without re-executing the
requested I/O operation, in case that the I/O
request is an output request and the I/O request
has been completed before the fault occurrence.

29. An apparatus according to claim 8, characterized
by further comprising means for making, when a
fault of said computer system occurs, a successful
I/O completion reply to the I/O requesting process
without re-executing the requested I/O operation, in
case that the I/O request is an output request and
the I/O request has been completed before the fault
occurrence.

30. An apparatus according to claim 9, characterized
by further comprising means for making, when a
fault occurs, a successful I/O completion reply to
the application process without re-executing the
requested I/O operation, in case that the I/O
request is an output request and the I/O request
has been completed before the fault occurrence.

31. An I/O control method in a computer system which
has one or more CPUs, a main memory, and one or
more I/O devices and characterized in that said
CPUs periodically save the internal state of said
CPUs and the contents of said main memory as a
checkpoint, and the internal state of said CPUs and
the contents of said main memory of the most
recent checkpoint are restored when a fault occurs
in said computer system to restart data processing,
comprising:

storing (A1 in FIG. 5) log data of state setting of
said I/O devices performed by said CPUs; and
restoring (B1 in FIG. 6) the state of said I/O
devices to that of the most recent checkpoint by
first initializing said I/O devices and second
replaying the state setting up sequence according
to said stored log data.

32. An article of manufacture comprising:

a computer usable medium having computer
readable program code means embodied
therein for causing statuses of input and output
(I/O) units to be restored to respective
checkpoints when a computer system is restarted
from occurrence of a fault, the computer readable
program code means in said article of
manufacture comprising:

computer readable program code means for
causing a computer to save log data of statuses
upon setting statuses including an operation
mode with respect to the I/O devices; and
computer readable program code means for
causing a computer to initialize the I/O devices
and set states of the I/O devices in accordance
with the saved log data, upon occurrence of a
fault in the computer system.

33. An article of manufacture comprising:

a computer usable medium having computer
readable program code means embodied
therein for causing statuses of input and output
(I/O) units to be recovered to respective
checkpoints when a computer system is restarted
from occurrence of a fault, the computer system
having one or more CPUs, a main memory, and
one or more I/O devices and periodically
saving the internal state of said CPUs and the
contents of said main memory as a checkpoint,
and the internal state of said CPUs and
contents of said main memory of the most recent
checkpoint being restored when a fault occurs
in said computer system to restart data
processing, the computer readable program
code means in said article of manufacture
comprising:

computer readable program code means for
causing a computer to save log data of states
upon setting statuses including an operation
mode with respect to the I/O devices;
computer readable program code means for
causing a computer to initialize the I/O devices
and set statuses of the I/O devices in
accordance with said log data, upon occurrence
of a fault in the computer system;
computer readable program code means for
causing a computer to create, when a process
in said computer system makes an I/O request,
a request block in said main memory which
contains information necessary to perform said
I/O request;

computer readable program code means for
causing a computer to perform an I/O operation
by accessing said I/O devices according to a
request block;

computer readable program code means for
causing a computer to initialize, upon restart
from the most recent checkpoint which follows
a fault occurrence, said I/O execution
processes in execution state and causing I/O
operations being performed by said I/O execution
processes to be performed again from the
beginning.